Humans have long relied on visual aids like sketches and diagrams to support reasoning and problem-solving. Visual tools, like auxiliary lines in geometry or graphs in calculus, are essential for understanding complex ideas. However, many tutoring systems remain text-based, providing feedback only through natural language. Leveraging recent advances in Large Multimodal Models (LMMs), this paper introduces Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. Built on a pre-trained LMM, Interactive Sketchpad is fine-tuned to provide step-by-step guidance in both text and visuals, enabling natural multimodal interaction with the student. It generates accurate and robust diagrams by incorporating code execution into the reasoning process. User studies on math problems in geometry, calculus, and trigonometry show that Interactive Sketchpad improves task comprehension, problem-solving accuracy, and engagement, highlighting its potential for transforming educational technologies.
Interactive Sketchpad enhances GPT-4o's ability to provide step-by-step, visual hints for problem-solving. Given a student query and problem statement, Interactive Sketchpad generates both textual hints and dynamic visual diagrams, allowing students to engage with the problem iteratively. Without Interactive Sketchpad, GPT-4o struggles to offer effective interactive guidance: it frequently reveals the answer and provides no visual aids. Interactive Sketchpad instead enables a natural, multimodal learning experience that improves conceptual understanding.
Overview of Interactive Sketchpad: Given a multimodal question, Interactive Sketchpad generates a program to create a visual aid, then uses the visual aid as part of a hint to help the user solve the problem. The visual aid is sent to an interactive whiteboard, where the user can write and draw on it before sending the annotated diagram back to receive feedback or further help.
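The "generate a program, run it, attach the resulting diagram to a hint" step in the overview can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Hint`, `run_diagram_program`, `make_hint`, `STUB_PROGRAM`) is hypothetical, the model call is replaced by a fixed stub, and the convention that a generated program stores an SVG string in a variable named `svg` is an assumption made only for this example.

```python
from typing import Callable, NamedTuple, Optional


class Hint(NamedTuple):
    text: str                   # natural-language hint shown to the student
    diagram_svg: Optional[str]  # None when no diagram could be produced


def run_diagram_program(program: str) -> Optional[str]:
    """Execute a generated drawing program and return the SVG it built.

    Assumed convention (hypothetical): the program assigns an SVG string
    to a variable named `svg`. exec() is NOT a sandbox; a real system
    would need to isolate untrusted generated code.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
    except Exception:
        return None  # a broken program degrades to a text-only hint
    svg = namespace.get("svg")
    return svg if isinstance(svg, str) else None


def make_hint(question: str,
              generate_program: Callable[[str], str],
              hint_text: str) -> Hint:
    """Ask the generator for a drawing program, run it, and bundle the hint."""
    program = generate_program(question)
    return Hint(text=hint_text, diagram_svg=run_diagram_program(program))


# Stand-in for model output: a fixed program that draws a right triangle.
STUB_PROGRAM = '''
points = [(10, 90), (90, 90), (10, 30)]
path = " ".join(f"{x},{y}" for x, y in points)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    f'<polygon points="{path}" fill="none" stroke="black"/></svg>'
)
'''

if __name__ == "__main__":
    hint = make_hint(
        "Find the hypotenuse of a right triangle with legs 80 and 60.",
        lambda question: STUB_PROGRAM,
        "Start by marking the right angle and labelling the two legs.",
    )
    print(hint.text)
    print(hint.diagram_svg)
```

In this sketch a failing program yields a text-only hint instead of an error. The whiteboard round trip, in which the student annotates the diagram and sends it back, is left out.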
@article{chen2025interactive,
author = {Chen, Steven-Shine and Lee, Jimin and Liang, Paul Pu},
title = {Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving},
journal = {arXiv preprint arXiv:2503.16434},
year = {2025},
}